On the Dirichlet Prior and Bayesian Regularization
Abstract
Motivation & Previous Work: A common objective in learning a model from data is to recover its network structure, while the model parameters themselves are of minor interest. For example, we may wish to recover regulatory networks from high-throughput data sources. Regularization is essential when learning from finite data sets: it not only yields smoother estimates of the model parameters than maximum likelihood, but also guides the selection of model structures. In the Bayesian approach, regularization is achieved by specifying a prior distribution over the parameters and subsequently averaging over the posterior distribution. In domains of discrete variables with multinomial distributions, the Dirichlet distribution is the most commonly used prior over the parameters, for two reasons: first, the Dirichlet distribution is conjugate to the multinomial distribution and hence permits analytical calculations; second, the Dirichlet prior is intimately tied to the desirable likelihood-equivalence property of network structures [1, 3]. The so-called equivalent sample size measures the strength of the prior belief. In [3], it was pointed out that a very strong prior belief can degrade the predictive accuracy of the learned model due to severe regularization of the parameter estimates. In contrast, the dependence of the learned network structure on the prior strength has received little attention in the literature, despite its relevance for recovering the "true" network structure underlying the data.
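As a concrete illustration of how the equivalent sample size (ESS) acts as a regularizer, the sketch below contrasts the maximum-likelihood estimate of a single multinomial with the posterior mean under a symmetric Dirichlet prior, and evaluates the Dirichlet-multinomial marginal likelihood, the per-family ingredient of the BDeu structure score. This is a minimal sketch, not code from the paper: the function names and the NumPy/SciPy usage are my own assumptions.

import numpy as np
from scipy.special import gammaln

def posterior_mean(counts, ess):
    """Posterior mean under a symmetric Dirichlet(ess/r, ..., ess/r) prior."""
    counts = np.asarray(counts, dtype=float)
    r = counts.size
    alpha = ess / r                      # pseudo-count added to every state
    return (counts + alpha) / (counts.sum() + ess)

def mle(counts):
    """Maximum-likelihood estimate (no regularization)."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

def log_marginal_likelihood(counts, ess):
    """Log Dirichlet-multinomial marginal likelihood of the counts; the analogous
    quantity per variable and parent configuration enters the BDeu structure score."""
    counts = np.asarray(counts, dtype=float)
    alpha = ess / counts.size
    return (gammaln(ess) - gammaln(ess + counts.sum())
            + np.sum(gammaln(alpha + counts) - gammaln(alpha)))

counts = np.array([8, 1, 1])             # a small, skewed sample over 3 states
print("MLE           :", mle(counts))
for ess in (1.0, 10.0, 100.0):
    # A larger ESS pulls the estimate toward the uniform distribution,
    # i.e. it regularizes the parameters more strongly, and it also changes
    # the marginal likelihood used to compare structures.
    print(f"ESS = {ess:6.1f} :", posterior_mean(counts, ess),
          " log marginal =", round(log_marginal_likelihood(counts, ess), 3))

Increasing the ESS drives the posterior mean toward the uniform distribution regardless of the data, which is the severe regularization effect noted in [3]; the marginal likelihood term shows that structure scores, too, depend on the chosen prior strength.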
Similar articles
Introducing of Dirichlet process prior in the Nonparametric Bayesian models frame work
Statistical models are used to learn about the mechanism that generated the data. Often it is assumed that the random variables y_i, i=1,…,n, are samples from a probability distribution F belonging to a parametric class of distributions. However, in practice, a parametric model may be inappropriate to describe the data. In this setting, the parametric assumption could be r...
Bayesian shrinkage
Penalized regression methods, such as L1 regularization, are routinely used in high-dimensional applications, and there is a rich literature on optimality properties under sparsity assumptions. In the Bayesian paradigm, sparsity is routinely induced through two-component mixture priors having a probability mass at zero, but such priors encounter daunting computational problems in high dimension...
Characterizing the Function Space for Bayesian Kernel Models
Kernel methods have been very popular in the machine learning literature in the last ten years, mainly in the context of Tikhonov regularization algorithms. In this paper we study a coherent Bayesian kernel model based on an integral operator defined as the convolution of a kernel with a signed measure. Priors on the random signed measures correspond to prior distributions on the functions mapp...
Bayesian Multi-Task Compressive Sensing with Dirichlet Process Priors
Compressive sensing (CS) is an emerging field that, under appropriate conditions, can significantly reduce the number of measurements required for a given signal. Specifically, if the m-dimensional signal u is sparse in an orthonormal basis represented by the m × m matrix Ψ, then one may infer u based on n ≪ m projection measurements. If u = Ψθ, where θ are the sparse coefficients in basis Ψ, the...
Posterior Consistency of the Silverman g-prior in Bayesian Model Choice
Kernel supervised learning methods can be unified by utilizing the tools from regularization theory. The duality between regularization and prior leads to interpreting regularization methods in terms of maximum a posteriori estimation and has motivated Bayesian interpretations of kernel methods. In this paper we pursue a Bayesian interpretation of sparsity in the kernel setting by making use of...